Addressing Function Approximation Error in Actor-Critic Methods
In value-based reinforcement learning methods such as deep Q-learning,
function approximation errors are known to lead to overestimated value
estimates and suboptimal policies. We show that this problem persists in an
actor-critic setting and propose novel mechanisms to minimize its effects on
both the actor and the critic. Our algorithm builds on Double Q-learning, by
taking the minimum value between a pair of critics to limit overestimation. We
draw the connection between target networks and overestimation bias, and
suggest delaying policy updates to reduce per-update error and further improve
performance. We evaluate our method on the suite of OpenAI gym tasks,
outperforming the state of the art in every environment tested.
Comment: Accepted at ICML 2018.
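For concreteness, a minimal PyTorch sketch of the two mechanisms this abstract names: taking the minimum over a pair of critics to limit overestimation, and delaying the actor update relative to the critic updates. Every name and hyperparameter here (critic1, policy_delay, gamma, and so on) is an illustrative assumption, not the authors' released implementation.

```python
import torch

# Minimal sketch: clipped double-Q target (minimum of two critics) plus
# a delayed actor update, as described in the abstract above.
def td3_update(critic1, critic2, actor, target_critic1, target_critic2,
               target_actor, batch, step, gamma=0.99, policy_delay=2):
    state, action, reward, next_state, done = batch
    with torch.no_grad():
        next_action = target_actor(next_state)
        # Pessimistic minimum of the two target-critic estimates
        # limits the overestimation that a single critic accumulates.
        target_q = torch.min(target_critic1(next_state, next_action),
                             target_critic2(next_state, next_action))
        y = reward + gamma * (1.0 - done) * target_q
    critic_loss = ((critic1(state, action) - y) ** 2).mean() + \
                  ((critic2(state, action) - y) ** 2).mean()
    # Delayed policy update: let the critics settle between actor steps
    # to reduce per-update error.
    actor_loss = None
    if step % policy_delay == 0:
        actor_loss = -critic1(state, actor(state)).mean()
    return critic_loss, actor_loss
```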
Cost Adaptation for Robust Decentralized Swarm Behaviour
Decentralized receding horizon control (D-RHC) provides a mechanism for
coordination in multi-agent settings without a centralized command center.
However, combining a set of different goals, costs, and constraints to form an
efficient optimization objective for D-RHC can be difficult. To allay this
problem, we use a meta-learning process -- cost adaptation -- which generates
the optimization objective for D-RHC to solve based on a set of human-generated
priors (cost and constraint functions) and an auxiliary heuristic. We use this
adaptive D-RHC method for control of mesh-networked swarm agents. This
formulation allows a wide range of tasks to be encoded and can account for
network delays, heterogeneous capabilities, and increasingly large swarms
through the adaptation mechanism. We leverage the Unity3D game engine to build
a simulator capable of introducing artificial networking failures and delays in
the swarm. Using the simulator we validate our method on an example coordinated
exploration task. We demonstrate that cost adaptation allows for more efficient
and safer task completion under varying environment conditions and increasingly
large swarm sizes. We release our simulator and code to the community for
future work.
Comment: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.
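A hedged sketch of the cost-adaptation idea: human-provided cost priors are combined into a single receding-horizon objective whose weights an auxiliary heuristic adapts online. The priors, heuristic scores, and update rule below are illustrative assumptions; the paper's actual adaptation mechanism may differ.

```python
import numpy as np

# Combine cost priors c_i(trajectory) into one weighted objective for D-RHC.
def make_objective(priors, weights):
    def objective(trajectory):
        return sum(w * c(trajectory) for w, c in zip(weights, priors))
    return objective

# Adapt weights from heuristic feedback: upweight costs the auxiliary
# heuristic flags as under-served, keeping a normalized convex combination.
def adapt_weights(weights, heuristic_scores, lr=0.1):
    w = np.asarray(weights, dtype=float) + lr * np.asarray(heuristic_scores)
    return (w / w.sum()).tolist()

# Example with two hypothetical priors over a 2D path: reach a goal,
# and penalize steps longer than one unit.
goal = np.array([5.0, 5.0])
priors = [
    lambda traj: float(np.linalg.norm(traj[-1] - goal)),
    lambda traj: float((np.linalg.norm(np.diff(traj, axis=0), axis=1) > 1.0).sum()),
]
weights = [0.5, 0.5]
objective = make_objective(priors, weights)
print(objective(np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])))
weights = adapt_weights(weights, heuristic_scores=[1.0, 0.2])
```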
Bayesian Policy Gradients via Alpha Divergence Dropout Inference
Policy gradient methods have had great success in solving continuous control
tasks, yet the stochastic nature of such problems makes deterministic value
estimation difficult. We propose an approach which instead estimates a
distribution by fitting the value function with a Bayesian Neural Network. We
optimize an α-divergence objective with a Bayesian dropout approximation
to learn and estimate this distribution. We show that using the Monte Carlo
posterior mean of the Bayesian value function distribution, rather than a
deterministic network, improves stability and performance of policy gradient
methods in continuous control MuJoCo simulations.
Comment: Accepted to the Bayesian Deep Learning Workshop at NIPS 2017.
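A minimal sketch, assuming MC dropout as the Bayesian approximation: a value network with dropout kept active at evaluation time, whose Monte Carlo posterior mean replaces a single deterministic estimate. Layer sizes, the dropout rate, and the sample count are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

# Value network with dropout layers that stay stochastic at inference.
class DropoutValueNet(nn.Module):
    def __init__(self, state_dim, hidden=64, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s)

# Monte Carlo posterior mean (and spread) of the value distribution.
def mc_posterior_mean(value_net, state, n_samples=32):
    value_net.train()  # keep dropout active so each pass is a posterior sample
    with torch.no_grad():
        samples = torch.stack([value_net(state) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)

v = DropoutValueNet(state_dim=8)
mean, std = mc_posterior_mean(v, torch.randn(4, 8))
```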
Normalizing Flow Ensembles for Rich Aleatoric and Epistemic Uncertainty Modeling
In this work, we demonstrate how to reliably estimate epistemic uncertainty
while maintaining the flexibility needed to capture complicated aleatoric
distributions. To this end, we propose an ensemble of Normalizing Flows (NF),
which are state-of-the-art in modeling aleatoric uncertainty. The ensembles are
created via sets of fixed dropout masks, making them less expensive than
creating separate NF models. We demonstrate how to leverage the unique
structure of NFs, base distributions, to estimate aleatoric uncertainty without
relying on samples, provide a comprehensive set of baselines, and derive
unbiased estimates for differential entropy. The methods were applied to a
variety of experiments, commonly used to benchmark aleatoric and epistemic
uncertainty estimation: 1D sinusoidal data, a 2D windy grid-world (Wet
Chicken), Pendulum, and Hopper. In these experiments, we set up
an active learning framework and evaluate each model's capability at measuring
aleatoric and epistemic uncertainty. The results show the advantages of using
NF ensembles in capturing complicated aleatoric distributions while
maintaining accurate epistemic uncertainty estimates.
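A hedged sketch of the fixed-dropout-mask ensemble idea: one shared network with K frozen binary masks over its hidden units, each mask defining an ensemble member. For brevity the flow is reduced to a conditional Gaussian head; the paper uses full normalizing flows, so treat this only as an illustration of how fixed masks yield cheap ensemble disagreement.

```python
import torch
import torch.nn as nn

class MaskedEnsemble(nn.Module):
    def __init__(self, in_dim, hidden=64, k=5, p=0.2):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.head = nn.Linear(hidden, 2)  # mean and log-scale of the base density
        # Fixed masks: sampled once and frozen, unlike ordinary dropout,
        # so each of the k masks defines a stable ensemble member.
        self.register_buffer("masks", (torch.rand(k, hidden) > p).float())

    def member(self, x, i):
        h = torch.relu(self.fc1(x)) * self.masks[i]
        mu, log_sigma = self.head(h).chunk(2, dim=-1)
        return mu, log_sigma.exp()

model = MaskedEnsemble(in_dim=1)
x = torch.randn(10, 1)
mus, sigmas = zip(*(model.member(x, i) for i in range(5)))
aleatoric = torch.stack(sigmas).pow(2).mean(0)  # average predicted variance
epistemic = torch.stack(mus).var(0)             # disagreement across masks
```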
Leveraging World Model Disentanglement in Value-Based Multi-Agent Reinforcement Learning
In this paper, we propose a novel model-based multi-agent reinforcement
learning approach named Value Decomposition Framework with Disentangled World
Model to address the challenge of achieving a common goal of multiple agents
interacting in the same environment with reduced sample complexity. Due to
scalability and non-stationarity problems posed by multi-agent systems,
model-free methods rely on a considerable number of samples for training. In
contrast, we use a modularized world model, composed of action-conditioned,
action-free, and static branches, to unravel the environment dynamics and
produce imagined outcomes based on past experience, without sampling directly
from the real environment. We employ variational auto-encoders and variational
graph auto-encoders to learn the latent representations for the world model,
which is merged with a value-based framework to predict the joint action-value
function and optimize the overall training objective. We present experimental
results in Easy, Hard, and Super-Hard StarCraft II micro-management challenges
to demonstrate that our method achieves high sample efficiency and exhibits
superior performance in defeating the enemy armies compared to other baselines.
Comment: 14 pages.
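A minimal sketch of the modularized world model this abstract describes, with action-conditioned, action-free, and static branches whose latents are merged to imagine the next observation. Dimensions, the recurrent cells, and the fusion step are assumptions for illustration; the paper pairs such a model with variational (graph) auto-encoders and a value-decomposition head.

```python
import torch
import torch.nn as nn

class DisentangledWorldModel(nn.Module):
    def __init__(self, obs_dim, act_dim, z_dim=16):
        super().__init__()
        self.action_cond = nn.GRUCell(obs_dim + act_dim, z_dim)  # agent-driven dynamics
        self.action_free = nn.GRUCell(obs_dim, z_dim)            # environment-driven dynamics
        self.static = nn.Linear(obs_dim, z_dim)                  # time-invariant context
        self.decoder = nn.Linear(3 * z_dim, obs_dim)

    def forward(self, obs, act, h_cond, h_free):
        h_cond = self.action_cond(torch.cat([obs, act], -1), h_cond)
        h_free = self.action_free(obs, h_free)
        # Merge the three branch latents and decode an imagined next observation,
        # so rollouts need no samples from the real environment.
        z = torch.cat([h_cond, h_free, self.static(obs)], -1)
        return self.decoder(z), h_cond, h_free

wm = DisentangledWorldModel(obs_dim=12, act_dim=4)
obs, act = torch.randn(2, 12), torch.randn(2, 4)
h_c = h_f = torch.zeros(2, 16)
next_obs, h_c, h_f = wm(obs, act, h_c, h_f)
```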